Overview:

This page contains the results of CoNGA analyses. Results in tables may have been filtered to reduce redundancy, focus on the most important columns, and limit length; full tables should exist as OUTFILE_PREFIX*.tsv files.

Command:

scripts/run_conga.py --all --gex_data /scratch.global/ben_testing/ben_tcr/Pair_11_Emory/outs/filtered_feature_bc_matrix.h5 --gex_data_type 10x_h5 --clones_file emoryPair11Final.tsv --organism human --outfile_prefix emoryPair11Final

Stats

num_cells_w_gex: 13070
num_features_start: 26530
num_cells_w_tcr: 2225
min_genes_per_cell: 200
max_genes_per_cell: 3500
max_percent_mito: 0.1
num_filt_max_genes_per_cell: 1555
num_filt_max_percent_mito: 0
num_antibody_features: 0
num_TR_genes: 43
num_TR_genes_in_hvg_set: 41
num_highly_variable_genes: 2211
num_cells_after_filtering: 670
num_clonotypes: 496
max_clonotype_size: 21
num_singleton_clonotypes: 436
nbr_frac_for_nndists: 0.1
num_gvg_hit_clonotypes: 6
num_gvg_hit_biclusters: 0

graph_vs_graph_stats


Here we assess overall graph-vs-graph correlation by counting the edges shared between the TCR and GEX neighbor graphs and comparing that observed number to the number we would expect if the graphs were completely uncorrelated. Our null model for uncorrelated graphs is to take the vertices of one graph and randomly renumber them (permute their labels). We compare the observed overlap to that expected under this null model by computing a Z-score, either by permuting the vertices of one graph many times to get a mean and standard deviation of the overlap distribution, or, for large graphs where this is time consuming, by using a regression model for the standard deviation.

The different rows of this table correspond to the different graph-graph comparisons that we make in the conga graph-vs-graph analysis: we compare K-nearest-neighbor graphs for GEX and TCR at different K values ("nbr_frac" aka neighbor-fraction, which reports K as a fraction of the total number of clonotypes) to each other and to GEX and TCR "cluster" graphs in which each clonotype is connected to all the other clonotypes with the same (GEX or TCR) cluster assignment. For two K values (the default), this gives 2*3=6 comparisons: GEX KNN graph vs TCR KNN graph, GEX cluster graph vs TCR KNN graph, and GEX KNN graph vs TCR cluster graph, for each of the two K values (aka nbr_fracs).

The column to look at is *overlap_zscore*. Higher values indicate more significant GEX/TCR covariation, with "interesting" levels starting around zscores of 3-5.
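As a rough illustration of the shuffling approach described above, the sketch below relabels the vertices of one graph at random and compares the observed number of shared edges to the shuffled distribution. The function name, the representation of each graph as a set of directed (source, target) pairs, and the choice of 100 shuffles are assumptions made for this example; it is not CoNGA's own implementation.

```python
import random
from statistics import mean, stdev

def permutation_zscore(edges_gex, edges_tcr, num_nodes, num_shuffles=100, seed=0):
    """Z-score of the shared-edge count against a label-shuffling null.

    edges_gex, edges_tcr: sets of directed (source, target) node-index pairs.
    Hypothetical helper for illustration only.
    """
    rng = random.Random(seed)
    observed = len(edges_gex & edges_tcr)
    labels = list(range(num_nodes))
    null_overlaps = []
    for _ in range(num_shuffles):
        rng.shuffle(labels)  # random renumbering of the TCR graph's vertices
        relabeled = {(labels[i], labels[j]) for i, j in edges_tcr}
        null_overlaps.append(len(edges_gex & relabeled))
    mu = mean(null_overlaps)
    sd = stdev(null_overlaps)
    z = (observed - mu) / sd if sd > 0 else float("nan")
    return observed, mu, sd, z

# Toy example: two identical ring graphs share every edge, so the Z-score is large.
n = 30
ring = {(i, (i + 1) % n) for i in range(n)} | {(i, (i - 1) % n) for i in range(n)}
observed, mu, sd, z = permutation_zscore(ring, ring, n)
print(observed, round(mu, 2), round(sd, 2), round(z, 2))
```

Fixing the seed makes the toy run repeatable; in practice the number of shuffles trades run time against the precision of the estimated standard deviation.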

Columns in more detail:

graph_overlap_type: KNN ("nbr") or cluster versus KNN ("nbr") or cluster

nbr_frac: the K value for the KNN graph, as a fraction of total clonotypes

overlap: the observed overlap (number of shared edges) between GEX and TCR graphs

expected_overlap: the expected overlap under a shuffled null model.

overlap_zscore: a Z-score for the observed overlap computed by subtracting the expected overlap and dividing by the standard deviation estimated from shuffling.
overlap expected_overlap overlap_mean overlap_sdev overlap_zscore overlap_zscore_fitted overlap_zscore_source nodes calculation_time calculation_time_fitted gex_edges tcr_edges gex_indegree_variance gex_indegree_skewness gex_indegree_kurtosis tcr_indegree_variance tcr_indegree_skewness tcr_indegree_kurtosis indegree_correlation_R indegree_correlation_P nbr_frac graph_overlap_type
22 16.032323 16.14 4.584801 1.278136 2.387600 shuffling 496 0.057114 0.000981 1984 1984 1.004293 1.830023 5.720712 0.419192 1.283199 3.219440 0.022573 0.616000 0.01 gex_nbr_vs_tcr_nbr
211 212.719192 213.18 14.794175 -0.147355 -0.178395 shuffling 496 0.202792 0.014598 1984 26324 1.004293 1.830023 5.720712 0.074457 -0.249930 -1.183560 0.069705 0.121054 0.01 gex_nbr_vs_tcr_cluster
296 267.490909 270.27 18.411874 1.397468 2.589448 shuffling 496 0.247460 0.018544 33102 1984 0.161475 0.059856 -1.029046 0.419192 1.283199 3.219440 -0.013584 0.762823 0.01 gex_cluster_vs_tcr_nbr
2436 2405.850505 2405.19 68.599081 0.449131 0.453443 shuffling 496 0.238049 0.148419 24304 24304 0.770041 1.476363 2.258068 0.264748 1.775346 5.449563 -0.108070 0.016048 0.10 gex_nbr_vs_tcr_nbr
2709 2605.810101 2599.38 58.558822 1.871964 2.025644 shuffling 496 0.242820 0.161326 24304 26324 0.770041 1.476363 2.258068 0.074457 -0.249930 -1.183560 -0.018191 0.686112 0.10 gex_nbr_vs_tcr_cluster
3341 3276.763636 3286.39 71.698242 0.761664 1.159376 shuffling 496 0.284870 0.204938 33102 24304 0.161475 0.059856 -1.029046 0.264748 1.775346 5.449563 -0.058544 0.193033 0.10 gex_cluster_vs_tcr_nbr
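The numbers in the table above can be checked by hand. The short sketch below reproduces overlap_zscore for the first row from overlap, overlap_mean, and overlap_sdev, as the column description states. It also shows that the expected_overlap values in the first two rows are numerically consistent with gex_edges * tcr_edges / (nodes * (nodes - 1)), the expected number of shared directed edges for two unrelated graphs. That second formula is an inference from the displayed values, not something the report states, and the function names are hypothetical.

```python
def expected_overlap(gex_edges, tcr_edges, nodes):
    """Expected shared directed edges if the two edge sets were unrelated.

    Assumed formula, inferred from the table values rather than from CoNGA.
    """
    return gex_edges * tcr_edges / (nodes * (nodes - 1))

def overlap_zscore(overlap, overlap_mean, overlap_sdev):
    """Observed overlap minus the shuffled mean, in units of the shuffled sdev."""
    return (overlap - overlap_mean) / overlap_sdev

# Row 1: gex_nbr_vs_tcr_nbr at nbr_frac 0.01
print(round(expected_overlap(1984, 1984, 496), 6))
print(round(overlap_zscore(22, 16.14, 4.584801), 6))

# Row 2: gex_nbr_vs_tcr_cluster at nbr_frac 0.01
print(round(expected_overlap(1984, 26324, 496), 6))
```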

graph_vs_graph


Graph vs graph analysis looks for correlation between GEX and TCR space by finding statistically significant overlap between two similarity graphs, one defined by GEX similarity and one by TCR sequence similarity.

Overlap is defined one node (clonotype) at a time by looking for overlap between that node's neighbors in the GEX graph and its neighbors in the TCR graph. The null model is that the two neighbor sets are chosen independently at random.

CoNGA looks at two kinds of graphs: K nearest neighbor (KNN) graphs, where K = neighborhood size is specified as a fraction of the number of clonotypes (defaults for K are 0.01 and 0.1), and cluster graphs, where each clonotype is connected to all the other clonotypes in the same (GEX or TCR) cluster. Overlaps are computed 3 ways (GEX KNN vs TCR KNN, GEX KNN vs TCR cluster, and GEX cluster vs TCR KNN), for each of the K values (called nbr_fracs short for neighbor fractions).

Columns (depend slightly on whether hit is KNN v KNN or KNN v cluster):

conga_score = P value for GEX/TCR overlap * number of clonotypes

mait_fraction = fraction of the overlap made up of 'invariant' T cells

num_neighbors* = size of neighborhood (K)

cluster_size = size of cluster (for KNN v cluster graph overlaps)

clone_index = 0-index of clonotype in adata object


conga_score num_neighbors_gex num_neighbors_tcr overlap overlap_corrected mait_fraction clone_index nbr_frac graph_overlap_type cluster_size gex_cluster tcr_cluster va ja cdr3a vb jb cdr3b
0.135374 49 NaN 12 12 0.0 249 0.10 gex_nbr_vs_tcr_cluster 42.0 0 5 TRAV21*01 TRAJ52*01 CAAPGAGGAGYGKLTF TRBV20-1*01 TRBJ2-5*01 CSASGTLQETQYF
0.145254 4 4.0 2 2 0.0 181 0.01 gex_nbr_vs_tcr_nbr NaN 2 1 TRAV18*01 TRAJ50*01 CVLRDRASYNKLMF TRBV27*01 TRBJ1-5*01 CASSLAGDSNQPQYF
0.163098 49 49.0 13 13 0.0 176 0.10 gex_nbr_vs_tcr_nbr NaN 6 1 TRAV18*01 TRAJ41*01 CVLGRSSSNSGYALNF TRBV6-2*01 TRBJ1-1*01 CASRDRILTEAFF
0.190294 49 NaN 16 16 0.0 401 0.10 gex_nbr_vs_tcr_cluster 70.0 6 1 TRAV8-2*01 TRAJ36*01 CAVKQTGVNNLFF TRBV4-3*01 TRBJ1-2*01 CASSQVYLFGGDDYTF
0.663982 49 NaN 15 15 0.0 214 0.10 gex_nbr_vs_tcr_cluster 70.0 1 1 TRAV2*01 TRAJ4*01 CAVEPGGYDKLIF TRBV14*01 TRBJ1-5*01 CASSQEGGLNQPQYF
0.663982 49 NaN 15 15 0.0 437 0.10 gex_nbr_vs_tcr_cluster 70.0 4 1 TRAV8-4*01 TRAJ50*01 CAAGPFVTYNKLMF TRBV6-3*01 TRBJ2-2*01 CASSYSGAAQLFF
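One way to turn the null model above (two neighbor sets chosen independently at random) into a P value is a hypergeometric tail probability. The sketch below assumes that the neighbors are drawn from the other num_clonotypes - 1 clonotypes and that the score is the tail probability multiplied by the number of clonotypes. Under those assumptions it reproduces the conga_score of the second row of the table (4 GEX neighbors, 4 TCR neighbors, overlap 2, 496 clonotypes). The function names are hypothetical, and CoNGA may apply further corrections (note the separate overlap_corrected column).

```python
from math import comb

def hypergeom_tail(k, population, successes, draws):
    """P(X >= k) when `draws` items are drawn without replacement from a
    population of size `population` containing `successes` marked items."""
    upper = min(successes, draws)
    favorable = sum(
        comb(successes, i) * comb(population - successes, draws - i)
        for i in range(k, upper + 1)
    )
    return favorable / comb(population, draws)

def conga_style_score(overlap, num_nbrs_gex, num_nbrs_tcr, num_clonotypes):
    """Tail probability of the neighbor-set overlap times the number of clonotypes.

    Assumed form for illustration, not taken from the CoNGA source.
    """
    pvalue = hypergeom_tail(overlap, num_clonotypes - 1, num_nbrs_gex, num_nbrs_tcr)
    return pvalue * num_clonotypes

# Second row of the table: K = 4 in both graphs, 2 shared neighbors.
score = conga_style_score(2, 4, 4, 496)
print(round(score, 6))
```

Because the score is a P value multiplied by the number of clonotypes, it can exceed 1 for weak overlaps.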

tcr_clumping


This table stores the results of the TCR "clumping" analysis, which looks for neighborhoods in TCR space with more TCRs than expected by chance under a simple null model of VDJ rearrangement.

For each TCR in the dataset, we count how many TCRs are within a set of fixed TCRdist radii (defaults: 24,48,72,96), and compare that number to the number expected given the size of the dataset, using a Poisson model. This approach is inspired by the ALICE and TCRnet methods.

Columns:

clump_type = 'global' unless we are optionally looking for TCR clumps within the individual GEX clusters

num_nbrs = neighborhood size (number of other TCRs with TCRdist

clump_type clone_index nbr_radius pvalue_adj num_nbrs expected_num_nbrs raw_count va ja cdr3a vb jb cdr3b clonotype_fdr_value clumping_group clusters_gex clusters_tcr
global 352 24 0.016892 1 0.000009 43.0 TRAV41*01 TRAJ33*01 CAVDSNYQLIW TRBV20-1*01 TRBJ1-4*01 CSARDRDTNEKLFF 0.008446 1 0 5
global 353 24 0.016892 1 0.000009 43.0 TRAV41*01 TRAJ33*01 CAVDSNYQLIW TRBV20-1*01 TRBJ1-4*01 CSARDRDTNEKLFF 0.008446 1 2 5
global 352 48 0.226258 1 0.000114 576.0 TRAV41*01 TRAJ33*01 CAVDSNYQLIW TRBV20-1*01 TRBJ1-4*01 CSARDRDTNEKLFF 0.008446 1 0 5
global 353 48 0.226258 1 0.000114 576.0 TRAV41*01 TRAJ33*01 CAVDSNYQLIW TRBV20-1*01 TRBJ1-4*01 CSARDRDTNEKLFF 0.008446 1 2 5
global 359 48 0.382189 1 0.000193 973.0 TRAV41*01 TRAJ50*01 CAVYYNKLMF TRBV4-3*01 TRBJ1-4*01 CASSQDRTGGEKLFF 0.076438 2 1 4
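The Poisson comparison described above can be sketched as an upper-tail probability: given the expected number of neighbors within a radius, how likely is it to see at least the observed number? The function names and the idea of multiplying by a generic number of tests are assumptions for this example; the report does not state how pvalue_adj is scaled.

```python
from math import exp

def poisson_tail(k, mu):
    """P(X >= k) for X ~ Poisson(mu), summing upper-tail terms directly so that
    very small probabilities are not lost to cancellation."""
    if k <= 0:
        return 1.0
    # Start from the probability mass at k: exp(-mu) * mu**k / k!
    term = exp(-mu)
    for i in range(1, k + 1):
        term *= mu / i
    total = 0.0
    i = k
    while term > 0.0 and (total == 0.0 or term > total * 1e-17):
        total += term
        i += 1
        term *= mu / i
    return min(total, 1.0)

def adjusted_clump_pvalue(num_nbrs, expected_num_nbrs, num_tests):
    """Raw Poisson tail probability times a number of tests (hypothetical scaling)."""
    return poisson_tail(num_nbrs, expected_num_nbrs) * num_tests

# One observed neighbor where about 0.000009 are expected (values as displayed in the table).
raw = poisson_tail(1, 0.000009)
print(raw)
```

With a tiny expected count, a single observed neighbor already gives a raw probability close to the expected count itself, which is why clonotypes with just one close neighbor can appear in this table.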

tcr_db_match


This table stores significant matches between TCRs in adata and TCRs in the file /scratch.global/ben_testing/conga/conga/data/new_paired_tcr_db_for_matching_nr.tsv

P values of matches are assigned by turning the raw TCRdist score into a P value based on a model of the V(D)J rearrangement process, so matches between TCRs that are very far from germline (for example) are assigned a higher significance.

Columns:

tcrdist: TCRdist distance between the two TCRs (adata query and db hit)

pvalue_adj: raw P value of the match * num query TCRs * num db TCRs

fdr_value: Benjamini-Hochberg FDR value for match

clone_index: index within adata of the query TCR clonotype

db_index: index of the hit in the database being matched

va,ja,cdr3a,vb,jb,cdr3b

db_XXX: where XXX is a field in the literature database
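The two corrections named in the column list can be sketched as follows: pvalue_adj scales the raw P value by the number of query/database pairs, and fdr_value applies the Benjamini-Hochberg step-up procedure. This is a generic textbook version with hypothetical function names; which set of P values CoNGA passes to its FDR calculation is not stated in this report.

```python
def adjust_match_pvalue(raw_pvalue, num_query_tcrs, num_db_tcrs):
    """Raw P value times the number of (query, database) TCR pairs."""
    return raw_pvalue * num_query_tcrs * num_db_tcrs

def benjamini_hochberg(pvalues):
    """Benjamini-Hochberg FDR values, returned in the input order."""
    n = len(pvalues)
    order = sorted(range(n), key=lambda i: pvalues[i])
    fdr = [0.0] * n
    running_min = 1.0
    # Walk from the largest P value down, enforcing monotonicity.
    for rank in range(n, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, pvalues[idx] * n / rank)
        fdr[idx] = running_min
    return fdr

fdr_values = benjamini_hochberg([0.01, 0.04, 0.03, 0.005])
print([round(v, 4) for v in fdr_values])
```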



tcr_graph_vs_gex_features


This table has results from a graph-vs-features analysis in which we look for genes that are differentially expressed (elevated) in specific neighborhoods of the TCR neighbor graph. Differential expression is assessed by a t-test first, for speed, and then by a Mann-Whitney U test for neighborhood/score combinations whose t-test P value passes an initial threshold (default is 10 times the P value threshold).

Each row of the table represents a single significant association, in other words a neighborhood (defined by the central clonotype index) and a gene.

The columns are as follows:

ttest_pvalue_adj = ttest_pvalue * number of comparisons

mwu_pvalue_adj = mannwhitney-U P-value * number of comparisons

log2enr = log2 fold change of gene in neighborhood (will be positive)

gex_cluster = the consensus GEX cluster of the clonotypes w/ biased scores

tcr_cluster = the consensus TCR cluster of the clonotypes w/ biased scores

num_fg = the number of clonotypes in the neighborhood (including center)

mean_fg = the mean value of the feature in the neighborhood

mean_bg = the mean value of the feature outside the neighborhood

feature = the name of the gene

mait_fraction = the fraction of the skewed clonotypes that have an invariant TCR

clone_index = the index in the anndata dataset of the clonotype that is the center of the neighborhood.


ttest_pvalue_adj mwu_pvalue_adj log2enr gex_cluster tcr_cluster feature mean_fg mean_bg num_fg clone_index mait_fraction nbr_frac graph_type feature_type
1.176625e-22 1.763414e-51 7.052102 0 5 ENSMMUG00000043894 3.116469 0.150584 43 -1 0.0 0.0 tcr_cluster gex
4.685757e-06 1.252591e-26 5.259000 0 5 ENSMMUG00000043894 2.253066 0.200828 50 425 0.0 0.1 tcr_nbr gex
1.087104e-05 4.463841e-25 5.243075 2 5 ENSMMUG00000043894 2.246696 0.201542 50 160 0.0 0.1 tcr_nbr gex
2.770734e-05 4.032554e-24 5.034332 0 5 ENSMMUG00000043894 2.162849 0.210942 50 29 0.0 0.1 tcr_nbr gex
2.190950e-05 6.529895e-23 5.141585 0 5 ENSMMUG00000043894 2.206009 0.206103 50 213 0.0 0.1 tcr_nbr gex
1.084771e-04 4.338609e-21 5.019883 0 5 ENSMMUG00000043894 2.157023 0.211595 50 203 0.0 0.1 tcr_nbr gex
5.373957e-04 1.955217e-20 4.793971 0 5 ENSMMUG00000043894 2.065626 0.221841 50 111 0.0 0.1 tcr_nbr gex
1.359607e-03 5.402948e-20 4.630569 5 5 ENSMMUG00000043894 1.999237 0.229284 50 402 0.0 0.1 tcr_nbr gex
4.017466e-04 4.052746e-19 4.846244 2 5 ENSMMUG00000043894 2.086820 0.219465 50 78 0.0 0.1 tcr_nbr gex
4.539481e-04 7.759940e-19 4.827226 2 5 ENSMMUG00000043894 2.079112 0.220330 50 140 0.0 0.1 tcr_nbr gex
4.835479e-04 7.893327e-19 4.818860 2 5 ENSMMUG00000043894 2.075720 0.220710 50 81 0.0 0.1 tcr_nbr gex
5.037712e-04 8.595118e-19 4.811466 0 5 ENSMMUG00000043894 2.072722 0.221046 50 255 0.0 0.1 tcr_nbr gex
2.215590e-03 1.207632e-18 4.658240 0 5 ENSMMUG00000043894 2.010492 0.228022 50 52 0.0 0.1 tcr_nbr gex
2.234398e-03 5.731625e-17 4.656072 0 5 ENSMMUG00000043894 2.009610 0.228121 50 74 0.0 0.1 tcr_nbr gex
4.915064e-03 8.908936e-17 4.561954 0 5 ENSMMUG00000043894 1.971311 0.232415 50 156 0.0 0.1 tcr_nbr gex
5.749920e-03 1.880442e-16 4.516485 2 5 ENSMMUG00000043894 1.952793 0.234491 50 23 0.0 0.1 tcr_nbr gex
6.742742e-03 1.670471e-15 4.563627 0 5 ENSMMUG00000043894 1.971992 0.232339 50 313 0.0 0.1 tcr_nbr gex
3.656652e-02 4.035340e-15 4.347598 2 5 ENSMMUG00000043894 1.883967 0.242207 50 249 0.0 0.1 tcr_nbr gex
1.660635e-02 6.252678e-15 4.377533 0 5 ENSMMUG00000043894 1.896169 0.240839 50 8 0.0 0.1 tcr_nbr gex
3.601341e-02 9.973538e-15 4.315171 0 5 ENSMMUG00000043894 1.870748 0.243689 50 129 0.0 0.1 tcr_nbr gex
3.009212e-02 2.369439e-14 4.243301 2 5 ENSMMUG00000043894 1.841454 0.246973 50 307 0.0 0.1 tcr_nbr gex
2.855111e-02 3.333033e-13 4.294776 2 5 ENSMMUG00000043894 1.862435 0.244621 50 352 0.0 0.1 tcr_nbr gex
2.855111e-02 3.333033e-13 4.294776 2 5 ENSMMUG00000043894 1.862435 0.244621 50 353 0.0 0.1 tcr_nbr gex
1.454555e-01 6.903124e-13 4.097158 0 5 ENSMMUG00000043894 1.781923 0.253647 50 416 0.0 0.1 tcr_nbr gex
1.489973e-01 7.006006e-13 4.093609 2 5 ENSMMUG00000043894 1.780479 0.253809 50 439 0.0 0.1 tcr_nbr gex
9.835122e-02 8.522497e-12 4.212458 0 5 ENSMMUG00000043894 1.828885 0.248382 50 253 0.0 0.1 tcr_nbr gex
1.709787e-01 1.184556e-11 4.115429 2 5 ENSMMUG00000043894 1.789362 0.252813 50 492 0.0 0.1 tcr_nbr gex
4.527795e-01 5.852798e-11 3.918442 5 5 ENSMMUG00000043894 1.709270 0.261792 50 59 0.0 0.1 tcr_nbr gex
2.968522e-01 1.625121e-10 4.092760 0 5 ENSMMUG00000043894 1.780133 0.253847 50 452 0.0 0.1 tcr_nbr gex
3.073661e-01 3.701793e-10 4.033128 2 5 ENSMMUG00000043894 1.755870 0.256567 50 34 0.0 0.1 tcr_nbr gex
1.493935e-03 4.165555e-10 3.829200 5 8 ENSMMUG00000056515 2.143218 0.424974 50 176 0.0 0.1 tcr_nbr gex
6.597906e-01 4.242324e-10 3.959445 5 5 ENSMMUG00000043894 1.725919 0.259925 50 380 0.0 0.1 tcr_nbr gex
6.687360e-01 4.794901e-10 3.939188 0 5 ENSMMUG00000043894 1.717692 0.260847 50 125 0.0 0.1 tcr_nbr gex
8.011512e-02 7.309025e-10 3.305261 0 3 ENSMMUG00000056515 1.896046 0.452684 50 274 0.0 0.1 tcr_nbr gex
5.263138e-01 1.746978e-09 3.875111 0 5 ENSMMUG00000043894 1.691690 0.263762 50 218 0.0 0.1 tcr_nbr gex
2.786468e+00 4.655902e-09 3.579095 2 5 ENSMMUG00000043894 1.572156 0.277163 50 483 0.0 0.1 tcr_nbr gex
1.617443e+00 1.885459e-08 3.752812 0 5 ENSMMUG00000043894 1.642174 0.269314 50 191 0.0 0.1 tcr_nbr gex
4.939308e+00 6.741138e-08 3.557013 2 5 ENSMMUG00000043894 1.563288 0.278157 50 42 0.0 0.1 tcr_nbr gex
2.996390e+00 2.011981e-07 3.751065 5 5 ENSMMUG00000043894 1.641468 0.269393 50 421 0.0 0.1 tcr_nbr gex
3.001085e+00 2.482482e-07 3.745496 0 5 ENSMMUG00000043894 1.639217 0.269645 50 18 0.0 0.1 tcr_nbr gex
4.745798e+00 5.505835e-07 3.573640 0 5 ENSMMUG00000043894 1.569965 0.277409 50 457 0.0 0.1 tcr_nbr gex
5.163879e+00 5.780375e-07 3.619675 0 5 ENSMMUG00000043894 1.588474 0.275334 50 65 0.0 0.1 tcr_nbr gex
6.319222e-01 4.080144e-06 3.103270 4 3 ENSMMUG00000056515 1.802207 0.463204 50 259 0.0 0.1 tcr_nbr gex
7.542347e+00 4.323079e-06 3.615010 0 5 ENSMMUG00000043894 1.586597 0.275544 50 422 0.0 0.1 tcr_nbr gex
1.574875e+00 4.983006e-06 3.069843 0 0 ENSMMUG00000056515 1.786774 0.464934 50 16 0.0 0.1 tcr_nbr gex
5.149386e-01 5.745158e-06 3.264272 6 1 ENSMMUG00000056515 1.876927 0.454827 50 148 0.0 0.1 tcr_nbr gex
1.275155e+00 7.276922e-06 2.994163 0 0 ENSMMUG00000056515 1.751941 0.468839 50 258 0.0 0.1 tcr_nbr gex
8.582253e-01 1.539187e-05 3.157389 4 3 ENSMMUG00000056515 1.827253 0.460396 50 208 0.0 0.1 tcr_nbr gex
6.437223e-01 2.992497e-05 3.228311 0 0 ENSMMUG00000056515 1.860185 0.456704 50 107 0.0 0.1 tcr_nbr gex
4.189580e+00 3.660095e-05 2.818277 0 3 ENSMMUG00000056515 1.671615 0.477845 50 275 0.0 0.1 tcr_nbr gex
Omitted 13 lines
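As a reading aid for the log2enr column: the first two rows of the table above are numerically consistent with log2(expm1(mean_fg) / expm1(mean_bg)), that is, a log2 fold change taken after undoing a natural log(1+x) transform of the means. This relationship is an observation about the displayed numbers, not a statement of how CoNGA computes the column (a pseudocount or other detail may be involved), and the function name is hypothetical.

```python
from math import expm1, log2

def log2_enrichment(mean_fg, mean_bg):
    """log2 fold change between foreground and background means, assuming both
    means are on a natural log(1 + x) scale (assumption inferred from the table)."""
    return log2(expm1(mean_fg) / expm1(mean_bg))

# Row 1 (tcr_cluster) and row 2 (tcr_nbr, clone_index 425) of the table.
row1 = log2_enrichment(3.116469, 0.150584)
row2 = log2_enrichment(2.253066, 0.200828)
print(round(row1, 3), round(row2, 3))
```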

tcr_graph_vs_gex_features_plot


This plot summarizes the results of a graph versus features analysis by labeling the clonotypes at the center of each biased neighborhood with the name of the feature biased in that neighborhood. The feature names are drawn in colored boxes whose color is determined by the strength and direction of the feature score bias (from bright red for features that are strongly elevated to bright blue for features that are strongly decreased in the corresponding neighborhoods, relative to the rest of the dataset).

At most one feature (the top scoring) is shown for each clonotype (ie, neighborhood). The UMAP xy coordinates for this plot are stored in adata.obsm['X_tcr_2d']. The score used for ranking correlations is 'mwu_pvalue_adj'. The threshold score for displaying a feature is 1.0. The feature column is 'feature'. Since we also run graph-vs-features using "neighbor" graphs that are defined by clusters, ie where each clonotype is connected to all the other clonotypes in the same cluster, some biased features may be associated with a cluster rather than a specific clonotype. Those features are labeled with a '*' at the end and shown near the centroid of the clonotypes belonging to that cluster.
Image source: emoryPair11Final_tcr_graph_vs_gex_features_plot.png

tcr_graph_vs_gex_features_panels


Graph-versus-feature analysis was used to identify a set of GEX features that showed biased distributions in TCR neighborhoods. This plot shows the distribution of the top-scoring GEX features on the TCR UMAP 2D landscape. The features are ranked by 'mwu_pvalue_adj', i.e., the Mann-Whitney-Wilcoxon adjusted P value (raw P value * number of comparisons). At most 3 features from clonotype neighborhoods in each (GEX,TCR) cluster pair are shown. The raw scores for each feature are averaged over the K nearest neighbors (K is indicated in the lower right corner of each panel) for each clonotype. The min and max nbr-averaged scores are shown in the upper corners of each panel. Points are plotted in order of increasing feature score.
Image source: emoryPair11Final_tcr_graph_vs_gex_features_panels.png

tcr_genes_vs_gex_features


This table has results from a graph-vs-features analysis in which we look for genes that are differentially expressed (elevated) in specific neighborhoods of the TCR neighbor graph. Differential expression is assessed by a t-test first, for speed, and then by a Mann-Whitney U test for neighborhood/score combinations whose t-test P value passes an initial threshold (default is 10 times the P value threshold).

Each row of the table represents a single significant association, in other words a neighborhood (defined by the central clonotype index) and a gene.

The columns are as follows:

ttest_pvalue_adj = ttest_pvalue * number of comparisons

mwu_pvalue_adj = mannwhitney-U P-value * number of comparisons

log2enr = log2 fold change of gene in neighborhood (will be positive)

gex_cluster = the consensus GEX cluster of the clonotypes w/ biased scores

tcr_cluster = the consensus TCR cluster of the clonotypes w/ biased scores

num_fg = the number of clonotypes in the neighborhood (including center)

mean_fg = the mean value of the feature in the neighborhood

mean_bg = the mean value of the feature outside the neighborhood

feature = the name of the gene

mait_fraction = the fraction of the skewed clonotypes that have an invariant TCR

clone_index = the index in the anndata dataset of the clonotype that is the center of the neighborhood.

In this analysis the TCR graph is defined by connecting all clonotypes that share the same gene segment. The analysis is run four times, once for each gene segment type (VA, JA, VB, and JB).
ttest_pvalue_adj mwu_pvalue_adj log2enr gex_cluster tcr_cluster feature mean_fg mean_bg num_fg clone_index mait_fraction gene_segment graph_type feature_type
8.385425e-01 4.129355e-83 10.102635 1 2 ENSMMUG00000056431 1.723031 0.004176 12 -1 0.0 TRAV35 tcr_genes gex
1.010029e-23 3.634089e-79 10.344209 0 3 ENSMMUG00000063185 3.494982 0.024281 31 -1 0.0 TRBV4-2 tcr_genes gex
5.384392e-01 1.700150e-72 9.028595 5 2 ENSMMUG00000059325 1.625720 0.007786 16 -1 0.0 TRAV25 tcr_genes gex
9.093194e-03 3.825079e-70 9.144055 5 1 ENSMMUG00000062211 2.458646 0.018717 17 -1 0.0 TRBV12-2 tcr_genes gex
2.697869e+00 4.603826e-60 8.295272 7 7 ENSMMUG00000056910 1.908267 0.018111 10 -1 0.0 TRAV16 tcr_genes gex
3.908276e-02 5.865665e-60 8.879587 5 1 ENSMMUG00000060662 2.512758 0.023789 10 -1 0.0 TRAV8-7 tcr_genes gex
1.192059e-21 3.603925e-56 8.117888 0 0 ENSMMUG00000062085 3.030515 0.068540 34 -1 0.0 TRBV4-3 tcr_genes gex
1.339802e-41 8.405509e-56 7.358897 0 5 ENSMMUG00000043894 3.267886 0.143109 42 -1 0.0 TRBV20-1 tcr_genes gex
6.785412e-01 3.781719e-54 9.250202 4 4 ENSMMUG00000054409 2.143225 0.012284 13 -1 0.0 TRAV6 tcr_genes gex
5.808227e-01 4.596215e-54 7.211970 1 1 ENSMMUG00000061081 1.064753 0.012735 19 -1 0.0 TRAV8-2 tcr_genes gex
4.261443e-06 5.419144e-52 8.408435 5 0 ENSMMUG00000065017 2.439549 0.030343 20 -1 0.0 TRAV12-1 tcr_genes gex
2.713978e+00 2.841522e-25 6.809848 5 1 ENSMMUG00000061119 1.832875 0.045748 16 -1 0.0 TRAV18 tcr_genes gex
1.453674e-06 3.751946e-20 6.988104 0 0 ENSMMUG00000051385 3.012414 0.141774 11 -1 0.0 TRBV7-4 tcr_genes gex
7.944003e-10 6.060995e-19 5.042675 0 0 ENSMMUG00000056515 2.958106 0.440856 31 -1 0.0 TRBV6-2 tcr_genes gex
2.889650e-14 2.144523e-18 5.463341 0 1 ENSMMUG00000056515 3.262967 0.450771 26 -1 0.0 TRBV6-3 tcr_genes gex
3.386548e-01 1.424218e-06 4.432549 6 2 ENSMMUG00000043894 2.360039 0.367535 10 -1 0.0 TRBV19 tcr_genes gex
1.412165e+00 1.704765e-06 5.041569 1 4 ENSMMUG00000043894 2.740171 0.364602 9 -1 0.0 TRBV21-1 tcr_genes gex
2.460468e-03 2.036946e-04 4.364812 4 1 ENSMMUG00000056515 2.766862 0.544415 12 -1 0.0 TRBV10-2 tcr_genes gex
3.417063e-03 1.467406e-01 1.124226 4 7 ENSMMUG00000059019 1.682992 1.101962 46 -1 0.0 TRBJ1-2 tcr_genes gex
2.055242e-01 4.459292e-01 2.586831 7 0 CST3 0.548076 0.114664 5 -1 0.0 TRBV6-8 tcr_genes gex

tcr_genes_vs_gex_features_panels


Graph-versus-feature analysis was used to identify a set of GEX features that showed biased distributions in TCR neighborhoods. This plot shows the distribution of the top-scoring GEX features on the TCR UMAP 2D landscape. The features are ranked by 'mwu_pvalue_adj', i.e., the Mann-Whitney-Wilcoxon adjusted P value (raw P value * number of comparisons). At most 3 features from clonotype neighborhoods in each (GEX,TCR) cluster pair are shown. The raw scores for each feature are averaged over the K nearest neighbors (K is indicated in the lower right corner of each panel) for each clonotype. The min and max nbr-averaged scores are shown in the upper corners of each panel. Points are plotted in order of increasing feature score.
Image source: emoryPair11Final_tcr_genes_vs_gex_features_panels.png

gex_graph_vs_tcr_features


This table has results from a graph-vs-features analysis in which we look at the distribution of a set of TCR-defined features over the GEX neighbor graph. We look for neighborhoods in the graph that have biased score distributions, as assessed by a t-test first, for speed, and then by a Mann-Whitney U test for neighborhood/score combinations whose t-test P value passes an initial threshold (default is 10 times the P value threshold).

Each row of the table represents a single significant association, in other words a neighborhood (defined by the central clonotype index) and a tcr feature.

The columns are as follows:

ttest_pvalue_adj = ttest_pvalue * number of comparisons

ttest_stat = ttest statistic (sign indicates whether feature is up or down)

mwu_pvalue_adj = mannwhitney-U P-value * number of comparisons

gex_cluster = the consensus GEX cluster of the clonotypes w/ biased scores

tcr_cluster = the consensus TCR cluster of the clonotypes w/ biased scores

num_fg = the number of clonotypes in the neighborhood (including center)

mean_fg = the mean value of the feature in the neighborhood

mean_bg = the mean value of the feature outside the neighborhood

feature = the name of the TCR score

mait_fraction = the fraction of the skewed clonotypes that have an invariant TCR

clone_index = the index in the anndata dataset of the clonotype that is the center of the neighborhood.


nbr_frac graph_type ttest_pvalue_adj ttest_stat mwu_pvalue_adj gex_cluster tcr_cluster num_fg mean_fg mean_bg feature mait_fraction clone_index feature_type
0.0 gex_cluster 0.840617 -3.543602 0.264194 2.0 5.0 70.0 -1.803266 -1.185967 imhc 0.0 -1.0 tcr
0.0 gex_cluster 1.525273 -3.591452 0.308921 6.0 4.0 28.0 0.776978 1.038857 disorder 0.0 -1.0 tcr
0.0 gex_cluster 0.655427 3.900773 0.309534 6.0 1.0 28.0 1.971037 1.893739 beta 0.0 -1.0 tcr
0.1 gex_nbr 0.330157 -5.014420 0.398259 7.0 1.0 50.0 -0.322845 -0.002894 cd8 0.0 269.0 tcr
0.0 gex_cluster 3.135113 -3.330630 0.603744 6.0 4.0 28.0 -5.775247 -5.491135 mjenergy 0.0 -1.0 tcr
0.1 gex_nbr 0.810211 -4.787417 0.691913 0.0 5.0 50.0 -0.327341 -0.002390 cd8 0.0 97.0 tcr
0.0 gex_cluster 0.936262 3.545102 0.708752 3.0 3.0 54.0 -0.131740 -0.328623 kf5 0.0 -1.0 tcr
0.0 gex_cluster 0.863751 3.597980 0.736687 4.0 3.0 52.0 -0.607811 -1.351001 imhc 0.0 -1.0 tcr
0.1 gex_nbr 0.542962 -4.875506 0.795582 0.0 5.0 50.0 -0.310133 -0.004319 cd8 0.0 473.0 tcr
0.1 gex_nbr 0.280316 -4.979091 4.841975 0.0 5.0 50.0 -0.265452 -0.009328 cd8 0.0 444.0 tcr

gex_graph_vs_tcr_features_plot


This plot summarizes the results of a graph versus features analysis by labeling the clonotypes at the center of each biased neighborhood with the name of the feature biased in that neighborhood. The feature names are drawn in colored boxes whose color is determined by the strength and direction of the feature score bias (from bright red for features that are strongly elevated to bright blue for features that are strongly decreased in the corresponding neighborhoods, relative to the rest of the dataset).

At most one feature (the top scoring) is shown for each clonotype (ie, neighborhood). The UMAP xy coordinates for this plot are stored in adata.obsm['X_gex_2d']. The score used for ranking correlations is 'mwu_pvalue_adj'. The threshold score for displaying a feature is 1.0. The feature column is 'feature'. Since we also run graph-vs-features using "neighbor" graphs that are defined by clusters, ie where each clonotype is connected to all the other clonotypes in the same cluster, some biased features may be associated with a cluster rather than a specific clonotype. Those features are labeled with a '*' at the end and shown near the centroid of the clonotypes belonging to that cluster.
Image source: emoryPair11Final_gex_graph_vs_tcr_features_plot.png

gex_graph_vs_tcr_features_panels


Image source: emoryPair11Final_gex_graph_vs_tcr_features_panels.png
ERROR -- missing image emoryPair11Final_gex_graph_vs_tcr_features_panels.png

graph_vs_features_gex_clustermap


This plot shows the distribution of significant features from graph-vs-features or HotSpot analysis plotted across the GEX landscape. Rows are features and columns are individual clonotypes. Columns are ordered by hierarchical clustering (if a dendrogram is present above the heatmap) or by a 1D UMAP projection (used for very large datasets or if 'X_pca_gex' is not present in adata.obsm_keys()). Rows are ordered by hierarchical clustering with a correlation metric.

The row colors to the left of the heatmap show the feature type (blue=TCR, orange=GEX). The row colors to the left of those indicate the strength of the graph-vs-feature correlation (also included in the feature labels to the right of the heatmap; keep in mind that highly significant P values for some features may shift the colorscale so everything else looks dark blue).

The column colors above the heatmap are GEX clusters (and TCR V/J genes if plotting against the TCR landscape). The text above the column colors provides more info.

Feature scores are Z-score normalized and then averaged over the K=49 nearest neighbors (0 means no nbr-averaging).

The 'coolwarm' colormap is centered at Z=0.

Since features of the same type (GEX or TCR) as the landscape and neighbor graph (ie GEX features) are more highly correlated over graph neighborhoods, their neighbor-averaged scores will show more extreme variation. For this reason, the nbr-averaged scores for these features from the same modality as the landscape itself are downscaled by a factor of rescale_factor_for_self_features=0.33.

The colormap in the top left is for the Z-score normalized, neighbor-averaged scores (multiply by 3.03 to get the color scores for the GEX features).
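The normalization steps described above can be sketched as follows: Z-score each feature, average over each clonotype's neighbors, and downscale features from the same modality as the landscape by rescale_factor_for_self_features. The function names, the list-of-lists neighbor representation, and the choice to include the clonotype itself in the average are assumptions for this example. The factor 3.03 quoted above is simply 1/0.33.

```python
from statistics import mean, pstdev

def zscore_normalize(scores):
    """Center to mean 0 and scale to unit (population) standard deviation."""
    mu, sd = mean(scores), pstdev(scores)
    if sd == 0:
        return [0.0 for _ in scores]
    return [(s - mu) / sd for s in scores]

def neighbor_average(values, neighbors):
    """Average each value with those of its neighbors (self included, by assumption).
    An empty neighbor list leaves the value unchanged (no nbr-averaging)."""
    return [mean([values[i]] + [values[j] for j in nbrs])
            for i, nbrs in enumerate(neighbors)]

def clustermap_values(scores, neighbors, is_self_feature, rescale_factor=0.33):
    """Z-score, neighbor-average, then downscale same-modality features."""
    averaged = neighbor_average(zscore_normalize(scores), neighbors)
    if is_self_feature:
        return [rescale_factor * v for v in averaged]
    return averaged

# Tiny example: three clonotypes on a path graph 0 - 1 - 2.
values = clustermap_values([1.0, 2.0, 3.0], [[1], [0, 2], [1]], is_self_feature=True)
print([round(v, 4) for v in values])
print(round(1 / 0.33, 2))  # factor for mapping colors back to unscaled scores
```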


Image source: emoryPair11Final_graph_vs_features_gex_clustermap.png

graph_vs_features_tcr_clustermap


This plot shows the distribution of significant features from graph-vs-features or HotSpot analysis plotted across the TCR landscape. Rows are features and columns are individual clonotypes. Columns are ordered by hierarchical clustering (if a dendrogram is present above the heatmap) or by a 1D UMAP projection (used for very large datasets or if 'X_pca_tcr' is not present in adata.obsm_keys()). Rows are ordered by hierarchical clustering with a correlation metric.

The row colors to the left of the heatmap show the feature type (blue=TCR, orange=GEX). The row colors to the left of those indicate the strength of the graph-vs-feature correlation (also included in the feature labels to the right of the heatmap; keep in mind that highly significant P values for some features may shift the colorscale so everything else looks dark blue).

The column colors above the heatmap are TCR clusters (and TCR V/J genes if plotting against the TCR landscape). The text above the column colors provides more info.

Feature scores are Z-score normalized and then averaged over the K=49 nearest neighbors (0 means no nbr-averaging).

The 'coolwarm' colormap is centered at Z=0.

Since features of the same type (GEX or TCR) as the landscape and neighbor graph (ie TCR features) are more highly correlated over graph neighborhoods, their neighbor-averaged scores will show more extreme variation. For this reason, the nbr-averaged scores for these features from the same modality as the landscape itself are downscaled by a factor of rescale_factor_for_self_features=0.33.

The colormap in the top left is for the Z-score normalized, neighbor-averaged scores (multiply by 3.03 to get the color scores for the TCR features).


Image source: emoryPair11Final_graph_vs_features_tcr_clustermap.png

graph_vs_summary


Summary figure for the graph-vs-graph and graph-vs-features analyses.
Image source: emoryPair11Final_graph_vs_summary.png

gex_clusters_tcrdist_trees


These are TCRdist hierarchical clustering trees for the GEX clusters (cluster assignments stored in adata.obs['clusters_gex']). The trees are colored by CoNGA score with a color score range of 4.96e+00 (blue) to 4.96e-09 (red). For coloring, CoNGA scores are log-transformed, negated, and square-rooted (with an offset in there, too, roughly speaking).
Image source: emoryPair11Final_gex_clusters_tcrdist_trees.png

conga_threshold_tcrdist_tree


This is a TCRdist hierarchical clustering tree for the clonotypes with CoNGA score less than 10.0. The tree is colored by CoNGA score with a color score range of 1.00e+01 (blue) to 1.00e-08 (red). For coloring, CoNGA scores are log-transformed, negated, and square-rooted (with an offset in there, too, roughly speaking).
Image source: emoryPair11Final_conga_threshold_tcrdist_tree.png

hotspot_features


Find GEX (TCR) features that show a biased distribution across the TCR (GEX) neighbor graph, using a simplified version of the Hotspot method from the Yosef lab.

DeTomaso, D., & Yosef, N. (2021). "Hotspot identifies informative gene modules across modalities of single-cell genomics." Cell Systems, 12(5), 446–456.e9.

PMID:33951459

Columns:

Z: HotSpot Z statistic

pvalue_adj: Raw P value times the number of tests (crude Bonferroni correction)

nbr_frac: The K NN nbr fraction used for the neighbor graph construction (nbr_frac = 0.1 means K=0.1*num_clonotypes neighbors)
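The two column definitions above amount to simple arithmetic. The sketch below converts nbr_frac into a neighbor count by truncating to an integer, which agrees with the K=49 and K=4 neighborhood sizes reported elsewhere on this page for 496 clonotypes, and applies the crude Bonferroni correction. Truncation is an assumption inferred from those values, and the function names are hypothetical.

```python
def neighbors_from_fraction(nbr_frac, num_clonotypes):
    """Number of nearest neighbors K for a given neighbor fraction (truncated)."""
    return max(1, int(nbr_frac * num_clonotypes))

def bonferroni_adjust(raw_pvalue, num_tests):
    """Raw P value times the number of tests; not capped, so values can exceed 1."""
    return raw_pvalue * num_tests

print(neighbors_from_fraction(0.1, 496), neighbors_from_fraction(0.01, 496))
print(bonferroni_adjust(0.001, 5000))
```

Leaving the adjusted value uncapped matches the tables on this page, where some adjusted P values are greater than 1.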


Z pvalue_adj feature feature_type nbr_frac
45.040090 0.000000e+00 ENSMMUG00000043894 gex 0.10
29.885820 2.323074e-192 ENSMMUG00000056515 gex 0.10
21.273270 1.550664e-96 ENSMMUG00000043894 gex 0.01
20.239359 3.404804e-87 ENSMMUG00000063185 gex 0.10
18.675301 6.073724e-74 ENSMMUG00000062085 gex 0.10
18.015172 1.143937e-68 ENSMMUG00000059325 gex 0.10
16.124270 1.330549e-54 ENSMMUG00000059325 gex 0.01
16.087118 2.425968e-54 ENSMMUG00000056431 gex 0.10
15.807421 2.135876e-52 ENSMMUG00000054409 gex 0.01
15.736950 6.519450e-52 ENSMMUG00000065017 gex 0.10
15.475130 3.944108e-50 ENSMMUG00000054409 gex 0.10
15.196598 2.876573e-48 ENSMMUG00000056515 gex 0.01
14.791922 1.275595e-45 ENSMMUG00000061081 gex 0.10
13.468720 1.845581e-37 ENSMMUG00000061119 gex 0.10
13.459147 2.100942e-37 ENSMMUG00000048246 gex 0.01
13.320344 1.361518e-36 ENSMMUG00000061081 gex 0.01
12.588912 1.876685e-32 ENSMMUG00000061119 gex 0.01
12.016336 2.252171e-29 ENSMMUG00000056431 gex 0.01
11.728704 7.015476e-28 ENSMMUG00000065017 gex 0.01
9.651768 3.731394e-18 ENSMMUG00000056910 gex 0.10
9.286605 1.230178e-16 ENSMMUG00000057062 gex 0.10
8.832353 7.915783e-15 ENSMMUG00000056910 gex 0.01
8.756327 1.557847e-14 ENSMMUG00000060662 gex 0.10
8.706822 2.413548e-14 ENSMMUG00000062211 gex 0.10
8.593218 6.531099e-14 PKHD1L1 gex 0.01
8.501788 1.441824e-13 HCN3 gex 0.01
8.336248 5.922382e-13 TMC4 gex 0.01
8.291139 8.662834e-13 ENSMMUG00000003532 gex 0.10
8.064314 5.687864e-12 ENSMMUG00000049680 gex 0.01
7.044762 1.495524e-10 cd8 tcr 0.10
6.916199 3.582881e-08 ENSMMUG00000061255 gex 0.01
6.701394 1.594342e-07 ENSMMUG00000060662 gex 0.01
6.608458 2.999171e-07 ENSMMUG00000063185 gex 0.01
6.568198 3.933145e-07 ENSMMUG00000051385 gex 0.10
6.473282 7.406116e-07 CD8A gex 0.10
6.037337 1.210023e-05 ENSMMUG00000062085 gex 0.01
5.968999 1.843590e-05 VAT1L gex 0.01
5.950222 2.068053e-05 EPHA1 gex 0.01
5.839269 4.049191e-05 RBMS2 gex 0.01
5.779115 5.799540e-05 ENSMMUG00000062211 gex 0.01
5.733677 7.589908e-05 ENSMMUG00000051857 gex 0.01
5.651851 1.225866e-04 ENSMMUG00000051385 gex 0.01
5.634538 1.355601e-04 ENSMMUG00000056196 gex 0.01
5.608222 1.578700e-04 TBX19 gex 0.01
5.566041 2.012555e-04 HOPX gex 0.10
5.499141 2.947459e-04 ENSMMUG00000048246 gex 0.10
5.427430 4.415346e-04 CDK18 gex 0.01
5.073621 3.014405e-03 ENSMMUG00000057062 gex 0.01
4.816631 1.127576e-02 ENSMMUG00000006133 gex 0.01
4.746981 1.594606e-02 ENSMMUG00000064087 gex 0.01
Omitted 6 lines
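
The pvalue_adj and nbr_frac columns can be related to raw quantities as in the sketch below. The function names are hypothetical. Truncating K to an integer is an assumption, although it agrees with the K=49 reported on this page for nbr_frac = 0.1 and 496 clonotypes; the floor of one neighbor is also an assumption. The adjusted P value is left unclipped, following the column description.

```python
def bonferroni_adjust(raw_pvalue, num_tests):
    """Crude Bonferroni correction: raw P value times the number of tests."""
    return raw_pvalue * num_tests

def neighbors_from_fraction(nbr_frac, num_clonotypes):
    """Convert a neighbor fraction into a neighbor count K."""
    # Truncation and the minimum of 1 are assumptions (see the text above).
    return max(1, int(nbr_frac * num_clonotypes))

if __name__ == "__main__":
    for frac in (0.01, 0.1):
        print(frac, neighbors_from_fraction(frac, 496))
    print(bonferroni_adjust(1e-5, 100))
```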

hotspot_gex_umap


HotSpot analysis (Nir Yosef lab, PMID: 33951459) was used to identify a set of GEX (TCR) features that showed biased distributions in TCR (GEX) space. This plot shows the distribution of the top-scoring HotSpot features on the GEX UMAP 2D landscape. The features are ranked by adjusted P value (raw P value * number of comparisons). The raw scores for each feature are averaged over the K nearest neighbors (K is indicated in the lower right corner of each panel) for each clonotype. The min and max nbr-averaged scores are shown in the upper corners of each panel.

Features are filtered by correlation coefficient to reduce redundancy: if a feature has a correlation of >= 0.9 (the max_feature_correlation argument to conga.plotting.plot_hotspot_umap) with a previously plotted feature, it is skipped. Points are plotted in order of increasing feature score.
Image source: emoryPair11Final_hotspot_combo_features_0.100_nbrs_gex_plot_umap_nbr_avg.png
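
The redundancy filter described above amounts to a greedy pass over the ranked features. The sketch below is illustrative only and is not the body of conga.plotting.plot_hotspot_umap: the helper names are hypothetical, and using the signed Pearson correlation (rather than its absolute value) is an assumption based on the wording above.

```python
import math

def pearson(a, b):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    if var_a == 0 or var_b == 0:
        return 0.0
    return cov / math.sqrt(var_a * var_b)

def filter_correlated(ranked_features, scores, max_feature_correlation=0.9):
    """Keep features in rank order, skipping any too similar to one already kept."""
    kept = []
    for name in ranked_features:
        redundant = any(
            pearson(scores[name], scores[other]) >= max_feature_correlation
            for other in kept
        )
        if not redundant:
            kept.append(name)
    return kept

if __name__ == "__main__":
    scores = {
        "A": [1.0, 2.0, 3.0, 4.0],
        "B": [2.0, 4.0, 6.0, 8.1],  # nearly proportional to A
        "C": [4.0, 1.0, 3.0, 2.0],
    }
    print(filter_correlated(["A", "B", "C"], scores))
```

Because the pass is greedy, the outcome depends on the ranking: the best-ranked member of each correlated group is the one that gets plotted.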

hotspot_gex_clustermap


This plot shows the distribution of significant features from graph-vs-features or HotSpot analysis plotted across the GEX landscape. Rows are features and columns are individual clonotypes. Columns are ordered by hierarchical clustering (if a dendrogram is present above the heatmap) or by a 1D UMAP projection (used for very large datasets or if 'X_pca_gex' is not present in adata.obsm_keys()). Rows are ordered by hierarchical clustering with a correlation metric.

The row colors to the left of the heatmap show the feature type (blue=TCR, orange=GEX). The row colors to the left of those indicate the strength of the graph-vs-feature correlation (also included in the feature labels to the right of the heatmap; keep in mind that highly significant P values for some features may shift the colorscale so everything else looks dark blue).

The column colors above the heatmap are GEX clusters (and TCR V/J genes if plotting against the TCR landscape). The text above the column colors provides more info.

Feature scores are Z-score normalized and then averaged over the K=49 nearest neighbors (0 means no nbr-averaging).

The 'coolwarm' colormap is centered at Z=0.

Since features of the same type (GEX or TCR) as the landscape and neighbor graph (i.e., GEX features) are more highly correlated over graph neighborhoods, their neighbor-averaged scores will show more extreme variation. For this reason, the neighbor-averaged scores for features from the same modality as the landscape are downscaled by a factor of rescale_factor_for_self_features=0.33.

The colormap in the top left is for the Z-score normalized, neighbor-averaged scores (multiply by 3.03 to get the color scores for the GEX features).


Image source: emoryPair11Final_hotspot_combo_features_0.100_nbrs_gex_plot_clustermap_nbr_avg.png
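
The score processing described for this clustermap (Z-score normalization, neighbor averaging, then downscaling of same-modality features) can be sketched as below. This is a hedged illustration rather than the CoNGA code: the function names are hypothetical, and including each clonotype in its own neighborhood average is an assumption. Only the factor 0.33 comes from the text above; note that 1/0.33 is approximately 3.03, the multiplier quoted for the colormap.

```python
import math

RESCALE_FACTOR_FOR_SELF_FEATURES = 0.33  # value quoted in the text above

def zscore(values):
    """Standardize values to mean 0 and (population) standard deviation 1."""
    n = len(values)
    mean = sum(values) / n
    sd = math.sqrt(sum((v - mean) ** 2 for v in values) / n)
    if sd == 0:
        return [0.0] * n
    return [(v - mean) / sd for v in values]

def neighbor_average(scores, neighbors):
    """Average each clonotype's score with those of its graph neighbors."""
    averaged = []
    for i, nbrs in enumerate(neighbors):
        members = [i] + list(nbrs)  # including self is an assumption
        averaged.append(sum(scores[j] for j in members) / len(members))
    return averaged

def clustermap_scores(values, neighbors, same_modality_as_landscape):
    """Z-score, neighbor-average, and downscale same-modality features."""
    averaged = neighbor_average(zscore(values), neighbors)
    factor = RESCALE_FACTOR_FOR_SELF_FEATURES if same_modality_as_landscape else 1.0
    return [factor * a for a in averaged]

if __name__ == "__main__":
    values = [0.0, 0.0, 1.0, 1.0]
    neighbors = [[1], [0], [3], [2]]
    print(clustermap_scores(values, neighbors, same_modality_as_landscape=True))
    print(clustermap_scores(values, neighbors, same_modality_as_landscape=False))
```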

hotspot_tcr_umap


HotSpot analysis (Nir Yosef lab, PMID: 33951459) was used to identify a set of GEX (TCR) features that showed biased distributions in TCR (GEX) space. This plot shows the distribution of the top-scoring HotSpot features on the TCR UMAP 2D landscape. The features are ranked by adjusted P value (raw P value * number of comparisons). The raw scores for each feature are averaged over the K nearest neighbors (K is indicated in the lower right corner of each panel) for each clonotype. The min and max nbr-averaged scores are shown in the upper corners of each panel.

Features are filtered by correlation coefficient to reduce redundancy: if a feature has a correlation of >= 0.9 (the max_feature_correlation argument to conga.plotting.plot_hotspot_umap) with a previously plotted feature, it is skipped. Points are plotted in order of increasing feature score.
Image source: emoryPair11Final_hotspot_combo_features_0.100_nbrs_tcr_plot_umap_nbr_avg.png

hotspot_tcr_clustermap


This plot shows the distribution of significant features from graph-vs-features or HotSpot analysis plotted across the TCR landscape. Rows are features and columns are individual clonotypes. Columns are ordered by hierarchical clustering (if a dendrogram is present above the heatmap) or by a 1D UMAP projection (used for very large datasets or if 'X_pca_tcr' is not present in adata.obsm_keys()). Rows are ordered by hierarchical clustering with a correlation metric.

The row colors to the left of the heatmap show the feature type (blue=TCR, orange=GEX). The row colors to the left of those indicate the strength of the graph-vs-feature correlation (also included in the feature labels to the right of the heatmap; keep in mind that highly significant P values for some features may shift the colorscale so everything else looks dark blue).

The column colors above the heatmap are TCR clusters (and TCR V/J genes if plotting against the TCR landscape). The text above the column colors provides more info.

Feature scores are Z-score normalized and then averaged over the K=49 nearest neighbors (0 means no nbr-averaging).

The 'coolwarm' colormap is centered at Z=0.

Since features of the same type (GEX or TCR) as the landscape and neighbor graph (i.e., TCR features) are more highly correlated over graph neighborhoods, their neighbor-averaged scores will show more extreme variation. For this reason, the neighbor-averaged scores for features from the same modality as the landscape are downscaled by a factor of rescale_factor_for_self_features=0.33.

The colormap in the top left is for the Z-score normalized, neighbor-averaged scores (multiply by 3.03 to get the color scores for the TCR features).


Image source: emoryPair11Final_hotspot_combo_features_0.100_nbrs_tcr_plot_clustermap_nbr_avg.png